
PhD Students

Zhaofeng Lin

Cohort: Cohort 5
Institution: TCD

Project Title

Speech recognition is central to technologies such as Siri and Alexa, and works well in controlled environments. However, machines still lag behind humans in the ability to seamlessly interpret multiple cues, such as facial expression, gesture, word choice, and mouth movements, to understand speech in noisy or otherwise challenging environments. Humans also have a remarkable ability to adapt on the fly to changing circumstances within a single conversation, such as intermittent noise or speakers with significantly different speaking styles or accents. Together, these two skills make human speech recognition extremely robust and versatile. This PhD seeks to develop deep learning architectures that better integrate the different modalities of speech and that can be deployed in an agile manner, allowing continuous adaptation to external factors. These two aspects are inherently intertwined and are key to developing next-generation speech recognition solutions.

Supervision Team

Description